STABILITY OF FAST ALGORITHMS FOR 
STRUCTURED LINEAR SYSTEMS* 

RICHARD P. BRENTt 

Abstract. We survey the numerical stability of some fast algorithms for solving systems of linear 

equations and linear least squares problems with a low displacement-rank structure. For example, 

the matrices involved may be Toeplitz or Hankel. We consider algorithms which incorporate pivoting 

^^v , without destroying the structure, and describe some recent results on the stability of these algorithms. 

— s^ , We also compare these results with the corresponding stability results for the well known algorithms 

■ of Schur/Bareiss and Levinson, and for algorithms based on the semi-normal equations. 
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)[^ . 1. Motivation. The standard direct method for solving dense n x n systems of 

^H ' linear equations is Gaussian elimination with partial pivoting. The usual implemen- 

r^ . tation requires of order n^ arithmetic operations. 

j^ , In practice, linear systems often arise from some physical system and have a 

structure which is a consequence of the physical system. For example, time-invariant 
physical systems often give rise to Toeplitz systems of linear equations (i j3.2l) . An 
n X n Toeplitz matrix is a dense matrix because it generally has n^ nonzero elements. 
However, it is determined by only 0{n) parameters (in fact, by the 2n— 1 entries in its 
first row and column). Similar examples are Hankel, Cauchy, Toeplitz-plus-Hankel, 

f~^ [ and Vandermonde matrices [351 US • 

When solving such a structured linear system it is possible to ignore the structure, 
and this may have advantages if standard software is available and n is not too large. 

lO ' However, if n is large or if many systems have to be solved, perhaps with real-time 

constraints (e.g. in radar and sonar applications), then it is desirable to take advantage 
of the structure. The primary advantage to be gained is that the time to solve a linear 
system is reduced by a factor of order n to 0{n^). Storage requirements may also be 
reduced by a factor of order n, from O(n^) to 0{n). 

Most papers concerned with algorithms for structured linear systems concentrate 
on the speed (usually measured in terms of the number of arithmetic operations 
required) and ignore questions of numerical accuracy. However, it is dangerous to use 
fast algorithms without considering their numerical properties. There is no point in 
obtaining an answer quickly if it is much less accurate than is justified by the data. 

In this paper we reverse the usual emphasis on speed, and concentrate on the 
numerical properties of fast algorithms. Because there are many classes of structured 
matrices, and an ever increasing number of fast algorithms, we can not attempt to be 
comprehensive. Our aim is to introduce the reader to the subject, illustrate some of 
the main ideas, and provide pointers to the literature. 
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The subject of numerical stability/instability of fast algorithms is confused for 
several reasons: 

1. Structured matrices are often very ill-conditioned. For example, the Hilbert 
matrix |i50, . defined by aij = l/(i + j — 1), is often used as an example of a poorly 
conditioned matrix [29 . This is a typical example of a Hankel matrix. Reversing the 
order of the rows gives a Toeplitz matrix. 

2. The solution might be less sensitive to structured perturbations (i.e. per- 
turbations which are physically plausible because they preserve the structure) than 
to general (unstructured) perturbations. The effect of rounding errors in methods 
which ignore the structure is generally equivalent to the introduction of unstructured 
perturbations. Ideally we should use a method which only introduces structured per- 
turbations, but this property often does not hold, or is difficult to prove, even for 
methods which take advantage of the structure to reduce the number of arithmetic 
operations. 

3. The error bounds which can be proved are usually much weaker than what 
is observed on "real" or "random" examples. Thus, methods which are observed to 
work well in practice can not always be guaranteed, and it is hard to know if it is just 
the analysis which is weak or if the method fails in some rare cases. 

4. An algorithm may perform well on special classes of structured matrices, 
e.g. positive definite matrices or matrices with positive reflection coefficients, but 
perform poorly or break down on broader classes of structured matrices. 

1.1. Outline. Different authors have given different (and sometimes inconsis- 
tent) definitions of stability and weak stability. We shall follow Bunch [HI [15]. For 
completeness, our definitions are given in f}2l 

The concept of displacement rank, discussed in ^ may be used to unify the 
discussion of many algorithms for structured matrices [52l |53l [54] . It is well known 
that systems of n linear equations with a low displacement rank (e.g. Toeplitz or 
Hankel matrices) can be solved in 0{n^) arithmetic operations. Asymptotically faster 
algorithms with time bound 0(n log^ n) exist [13], but are not considered here because 
their numerical properties are generally poor and the constant factors hidden in the 
"O" notation are large [1]. 

For positive definite Toeplitz matrices, the first 0{n^) algorithms were introduced 
by Kolmogorov [SB], Wiener [75] and Levinson j57j. These algorithms are related to 
recursions of Szego |74| for polynomials orthogonal on the unit circle. Another class 
of 0{n^) algorithms, e.g. the Bareiss algorithm [2], are related to Schur's algorithm 
for finding the continued fraction representation of a holomorphic function in the unit 
disk [64j . This class can be generalized to cover unsymmetric matrices and more 
general "low displacement rank" matrices [SJ. In §2J]S]we consider the numerical 
stability of some of these algorithms. The GKO-Cauchy and GKO-Toeplitz algorithms 
are discussed in 21 The Schur/Bareiss algorithm for positive definite matrices is 
considered in ^ 35. 11 and generalized Schur algorithms are mentioned in ^ 35. 21 In S[6]we 
consider fast orthogonal factorization algorithms and the fast solution of structured 
least squares problems. 

Algorithms for Vandermonde and many other classes of structured matrices are 
not considered here - we refer to the extensive survey by Olshevsky [60]. Also, we have 
omitted any discussion of fast "lookahead" algorithms [18l[l9l|30ll3U|42l|43l|44l|45l[72] 
because, although such algorithms often succeed in practice, in the worst case they 
require of order n^ operations. 
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Much work has been done on iterative methods for Toephtz and related systems. 
Numerical stability is not a major problem with iterative methods, but the speed 
of convergence depends on a good choice of preconditioner. We do not consider 
iterative methods here, except to mention iterative refinement as a way of improving 
the accuracy of weakly stable direct methods. 

1.2. Notation. In the following, R denotes a structured matrix, T is a Toeplitz 
or Toeplitz-type matrix, P is a permutation matrix, L is lower triangular, U is upper 
triangular, and Q is orthogonal. In error bounds, Onis) means 0{ef{n)), where f{n) 
is a polynomial in n. We do not usually try to specify the polynomial /(n) precisely 
because it depends on the norm that is used to measure the error and on many 
unimportant details of the implementation. 

2. Stability and Weak Stability. In this section we give definitions of stability 
and weak stability of algorithms for solving linear systems. 

Consider algorithms for solving a nonsingular, nx n linear system Ax = b. There 
are many definitions of numerical stability in the literature, for example O |4j |8l [Q] 
m [Ml IMl El [SH ISl ISS] . Definitions [O and O below are taken from Bunch [H] . 

Definition 2.1. An algorithm for solving linear equations is stable for a class 
of matrices A if for each A in A and for each b the computed solution x to Ax — b 
satisfies Ax — b, where A is close to A and b is close to b. 

Definition 12.11 says that, for stability, the computed solution has to be the exact 
solution of a problem which is close to the original problem. This is the classical 
backward stability of Wilkinson [79l [801 HI] • We interpret "close" to mean close in the 
relative sense in some norm, i.e. 

\\A-A\\/\\A\\=On{e), \\b-b\\/\\b\\^On{e). 

Note that the matrix A is not required to be in the class A. For example, A 
might be the class of nonsingular Toeplitz matrices, but A is not required to be a 
Toeplitz matrix. If we require A € A we get what Bunch 15, calls strong stability. 
For a discussion of the difference between stability and strong stability for Toeplitz 
algorithms, see [34 l l35 l l48 l [76] . 

Stability does not imply that the computed solution x is close to the exact solution 
a;, unless the problem is well-conditioned. Provided ne is sufficiently small, stability 
implies that 

(2.1) \\x-x\\/\\x\\^Or.{^e). 

For more precise results, see Bunch [15] and Wilkinson [79] . 

As an example, consider the method of Gaussian elimination. Wilkinson |79] 
shows that 

||A-A||/||A||=0„(5e), 

where g = g{n) is the "growth factor", g depends on whether partial or complete 
pivoting is used. In practice g is usually moderate, even for partial pivoting. However, 
a well-known example shows that g{n) = 2"^^ is possible for partial pivoting, and 
it has been shown that examples where g{n) grows exponentially with n may arise 
in applications, e.g. for linear systems arising from boundary value problems. Even 
for complete pivoting, it has not been proved that g{n) is bounded by a polynomial 
in n. Wilkinson [75] showed that g{n) < „(iog")/4+o(i)^ ^j^^ Gould \W showed that 
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g{n) > n is possible for n > 12; there is still a large gap between these results. Thus, 
to be sure that Gaussian elimination satisfies Definition 12. 1[ we must restrict A to 
the class of matrices for which g is 0„(1). In practice this is not a problem, because 
g can easily be checked a posteriori |81) . 

Although stability is desirable, it is more than we can prove for many useful algo- 
rithms. Thus, following Bunch |15) . we define the (weaker, but still useful) property 
of weak stability. 

Definition 2.2. An algorithm for solving linear equations is weakly stable for a 
class of matrices A if for each well- conditioned A in A and for each b the computed 
solution X to Ax — b is such that \\x — a;||/||a;|| is small. 

In Definition l2.2[ we take "small" to mean 0„(e), and "well-conditioned" to mean 
that k{A) is 0„(1), i.e. is bounded by a polynomial in n. From (|2.ip . stability implies 
weak stability. 

Define the residual r = Ax — b. It is well-known [50] that 

1 Ikll lli — x\\ \\r\\ 

Thus, for well-conditioned A, \\x — x||/||a;|| is small if and only if |lf|l/|l&|l is small. 
This observation clearly leads to an alternative definition of weak stability: 

Definition 2.3. An algorithm for solving linear equations is weakly stable for a 
class of matrices A if for each well- conditioned A in A and for each b the computed 
solution X to Ax — b is such that \\Ax — b\\/\\b\\ is small. 

2.1. Example: orthogonal factorization. To illustrate the concepts of stabil- 
ity and weak stability, consider computation of the Cholesky factor R of A'^A, where 
A is an mx n matrix of full rank n. A good 0{mn^) algorithm is to compute the QR 
factorization 

A^QR 

of A using Householder or Givens transformations [38^ . It can be shown "80] that the 
computed matrices Q, R satisfy 

(2.3) A = QR 

where Q^Q = I, Q is close to Q, and A is close to A. Thus, the algorithm is stable 
in the sense of backward error analysis. Note that ||^"^^ — i?"^i?||/||A^A|j is small, 
but IIQ — Q\\ and \\R — i?||/||-R|| are not necessarily small. Bounds on \\Q — Q\\ and 
11^— i?|j/|ji?|| depend on k, and are discussed in [551 [551 [5T] . 

A different algorithm is to compute (the upper triangular part of) A^A, and then 
compute the Cholesky factorization of A'^A by the usual (stable) algorithm. The 
computed result R is such that R^R is close to A^A. However, this does not imply 
the existence of A and Q such that p.3p holds (with A close to A and some Q with 
Q^Q = I) unless A is well-conditioned [67]. By analogy with Definition 12.31 above, 
we may say that Cholesky factorization of A^A gives a weakly stable algorithm for 
computing R, because the "residual" A^A — R^R is small. 

3. Classes of Structured Matrices. Structured matrices R satisfy a Sylvester 
equation which has the form 

(3.1) V{^^,^,}(i?)-A/i?-i?A = $*, 
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where Af and Ai, have some simple structure (usually banded, with 3 or fewer full 
diagonals), $ and ^ are n x a and a x n respectively, and a is some fixed integer. 
The pair of matrices ($, \1/) is called the {Af, Ai,}- generator of R. 

a is called the {Af, Ai,}- displacement rank of R. We are interested in cases where 
a is small (say at most 4). 

For a discussion of the history and some variants of p.ip , see [33] and the refer- 
ences given there, in particular jmi53) . 

3.1. Cauchy and Cauchy-type matrices. Particular choices of Af and Ai, 
lead to definitions of basic classes of matrices. Thus, for a Cauchy matrix 



C(t,s) 



1 



f'i Sj 



we have 



and 



Af = Dt = diag(ti,t2,...,tn) , 



Ab^ Ds ^ diag(si,S2,...,s„) 



$ =* = [1,1, ■•■,!] • 

As a natural generalization, we can take $ and ^ to be any rank-a matrices, with 
Af,Ah as above. Then a matrix R satisfying the Sylvester equation p.l|) is said to 
be a Cauchy-type matrix. 

3.2. Toeplitz matrices. For a Toeplitz matrix T = [tij] ~ [oi-j], we take 



A, 





1 

1 



1 




1 



, Ab^Z^i 





1 

1 



-1 




1 



and 



$ 



1 ••• 

oo ai^n + CLi ■■■ a-i+an-1 



-\T 



* 



••• 1 



We can generalize to Toeplitz-type matrices by taking $ and '^ to be general rank-a 
matrices. 
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4. Structured Gaussian Elimination. Let an input matrix, i?i, have the 
partitioning 



Ri = 



yi ^1 



The first step of normal Gaussian elimination is to premultiply Ri by 



which reduces it to 



where 



-Yi/di 



di wj 

i?2 



R2 



-Ri - Yi^T/di 



is the Schur complement of di in i?i. At this stage, Ri has the factorization 
-Ri 



1 



0^ 

/ 



i?2 



One can proceed recursively with the Schur complement _R2j eventually obtaining a 
factorization i?i = LU . 

The key to structured Gaussian elimination is the fact that the displacement struc- 
ture is preserved under Schur complementation, and that the generators of the Schur 
complement Rk+i can be computed from the generators of R)~ in 0{n) operations. 

Row and/or column interchanges destroy the structure of matrices such as Toeplitz 
matrices. However, if Af is diagonal (which is the case for Cauchy and Vandermonde 
type matrices), then the structure is preserved under row permutations. 

This observation leads to the GKO-Cauchy algorithm of Gohberg, Kailath and 
Olshevsky [33^ for fast factorization of Cauchy-type matrices with partial pivoting, and 
many recent variations on the theme by Boros, Gohberg, Ming Gu, Heinig, Kailath, 
Olshevsky, M. Stewart, et ah see [101 123 |40l Hi |60l [68] . 

4.1. The GKO-Toeplitz algorithm. Heinig J6^ showed that, if T is a Toeplitz- 
type matrix, then 



R = FTD-^F* 



is a Cauchy-type matrix, where 



F = 



1 



^ 2^T^{k-l)U-l)/n^ 

[f- Jl<A:j<n 



is the Discrete Fourier Transform matrix. 



D = diag(l, e'^*/", . . . , e"("-i)/"). 



and the generators of T and R are simply related. 
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The transformation T ^ Ris perfectly stable because F and D are unitary. Note 
that R is (in general) complex even if T is real. This increases the constant factors in 
the time bounds, because complex arithmetic is required. 

Heinig's observation was exploited by Gohberg, Kailath and Olshevsky 33 : R 
can be factorized as i? = P^LU using GKO-Cauchy. Thus, from the factorization 

T = F*P^LUFD , 

a linear system involving T can be solved in 0{n^) operations. The full procedure 
of conversion to Cauchy form, factorization, and solution requires 0{n'^) (complex) 
operations. 

Other structured matrices, such as Hankel, Toeplitz-plus-Hankel, Vandermonde, 
Chebyshev-Vandermonde, etc, can be converted to Cauchy-type matrices in a similar 
way. 

4.2. Error Analysis. Because GKO-Cauchy and GKO-Toeplitz involve partial 
pivoting, we might guess that their stability would be similar to that of Gaussian 
elimination with partial pivoting. Unfortunately, there is a flaw in this reasoning. 
During GKO-Cauchy the generators have to be transformed, and the partial pivoting 
does not ensure that the transformed generators are small. 

Sweet and Brent [73] show that significant generator growth can occur if all the el- 
ements of $^ are small compared to those of |$| |^|. This can not happen for ordinary 
Cauchy matrices because $'^'^-' and ^^'^^ have only one column and one row respec- 
tively. However, it can happen for higher displacement-rank Cauchy-type matrices, 
even if the original matrix is well-conditioned. 

4.3. The Toeplitz Case. In the Toeplitz case there is an extra constraint on 
the selection of $ and ^, but it is still possible to give examples where the normalized 
solution error grows like k^ and the normalized residual grows like k, where k is the 
condition number of the Toeplitz matrix. Thus, the GKO-Toeplitz algorithm is (at 
best) weakly stable. 

It is easy to think of modified algorithms which avoid the examples given by 
Sweet and Brent, but it is difficult to prove that they are stable in all cases. Stability 
depends on the worst case, which may be rare and hard to find by random sampling. 

The problem with the original GKO algorithm is growth in the generators. Ming 
Gu suggested exploiting the fact that the generators are not unique. Recall the 
Sylvester equation (13. ip . Clearly we can replace $ by $Af and ^I' by M^^'i', where 
M is any invertible a x a matrix, because this does not change the product $\l/. 
Similarly at later stages of the GKO algorithm. 

Ming Gu [40] proposes taking M to orthogonalize the columns of $ (that is, at 
each stage perform an orthogonal factorization of the generators). M. Stewart [68] 
proposes a (cheaper) LU factorization of the generators. In both cases, clever pivoting 
schemes give error bounds analogous to those for Gaussian elimination with partial 
pivoting. 

4.4. Gu and Stewart's error bounds. The error bounds obtained by Ming 
Gu and M. Stewart involve a factor K^, where K depends on the ratio of the largest 
to smallest modulus elements in the Cauchy matrix 

1 
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Although this is unsatisfactory, it is similar to the factor 2"^^ in the error bound 
for Gaussian elimination with partial pivoting. In practice, the latter factor is ex- 
tremely pessimistic, which explains why Gaussian elimination with partial pivoting 
is popular [49]. Perhaps the bounds of Ming Gu and M. Stewart are similarly pes- 
simistic, although practical experience is not yet extensive enough to be confident 
of this. M. Stewart }68j gives some interesting numerical results which indicate that 
his scheme works well, but more numerical experience is necessary before a definite 
conclusion can be reached. 

4.5. A general strategy for guaranteed results. It often happens that there 
is a choice of 

1. A fast algorithm which usually gives an accurate result, but occasionally fails 
(or at least can not be proved to succeed every time), or 

2. An algorithm which is guaranteed to be stable but is slow. 

In such cases, a good strategy is to use the fast algorithm, but then check the normal- 
ized residual \\Ax — b\\/{\\A\\ ■ \\x\\, where x is the computed solution of the system 
Ax = b. If the residual is sufficiently small we can accept i as a reasonably accurate 
solution. In the rare cases that the residual is not sufficiently small we can can use 
the slow but stable algorithm (alternatively, if the residual is not too large, one or 
two iterations of iterative refinement may be sufficient and faster [51[ I81|). 

An example of this general strategy is the solution of a Toeplitz system by Ming 
Gu or M. Stewart's modification of the GKO algorithm. We can use the 0{n^) 
algorithm, check the residual, and resort to iterative refinement or a stable 0{n^) 
algorithm in the (rare) cases that it is necessary. Computing the residual takes only 
0(n log n) arithmetic operations. 

5. Positive Definite Structured Matrices. An important class of algorithms, 
typified by the algorithm of Bareiss [^ , find an LU factorization of a Toeplitz matrix T, 
and (in the symmetric case) are related to the classical algorithm of Schur [171 1321 IM] • 

It is interesting to consider the numerical properties of these algorithms and 
compare with the numerical properties of the Levinson algorithm (which essentially 
finds an LU factorization of T~^). 

5.1. The Bareiss algorithm for positive definite matrices. Bojanczyk, 
Brent, de Hoog and Sweet (abbreviated BBHS) have shown in [8l [70] that the nu- 
merical properties of the Bareiss algorithm are similar to those of Gaussian elimina- 
tion {without pivoting). Thus, the algorithm is stable for positive definite symmetric 
Toeplitz matrices. 

The Levinson algorithm can be shown to be weakly stable for bounded n, and 
numerical results by Varah [771, BBHS and others suggest that this is all that we 
can expect. Thus, the Bareiss algorithm is (generally) better numerically than the 
Levinson algorithm. 

Cybenko [24l showed that if certain quantities called "reflection coefficients" are 
positive then the Levinson-Durbin algorithm for solving the Yule- Walker equations 
(a positive-definite system with special right-hand side) is stable. Unfortunately, 
Cybenko's result is not usually applicable, because most positive-definite Toeplitz 
matrices do not usually satisfy the restrictive condition on the reflection coefficients. 
Some relevant numerical examples are given in [8]. 

5.2. The generalized Schur algorithm. The Schur algorithm can be gener- 
alized to factor a large variety of structured matrices [SS] [53]. For example, suitably 
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generalized Schur algorithms apply to block Toeplitz matrices, Toeplitz block raatri- 
ces, and to matrices of the form T^'^T, where T is rectangular Toeplitz. 

It is natural to ask if the stability results of BBHS (which are for the classical 
Schur/Bareiss algorithm) extend to the generalized Schur algorithm. This was con- 
sidered by Chandrasekharan and Sayed [20] and M. Stewart and Van Dooren [69] . 
The displacement structure considered in [20' is more general than that considered 
in [69] . which in turn is more general than that considered in [8, 70 1. Each increase 
in generality complicates the analysis and seems to make the stability result more 
dependent on details of the method of implementing the hyperbolic rotations which 
occur in the algorithms. The interested reader is referred to [20, 69[ for details. 

The overall conclusion is that the generalized Schur algorithm is stable for positive 
definite symmetric (or Hermitian) matrices, provided the hyperbolic transformations 
in the algorithm are implemented correctly. 

6. Fast Orthogonal Factorization. In an attempt to achieve stability without 
pivoting, and to solve mx n least squares problems {m > n), it is natural to consider 
algorithms for computing an orthogonal factorization 

(6.1) T = QU 

of an 771 X n Toeplitz matrix T. We assume that T has full rank n. For simplicity, in 
the time bounds we assume m — 0{n) to avoid functions of both m and n. 

The first 0{n^) (more precisely, 0{mn)) algorithm for computing the factoriza- 
tion (|6.ip was introduced by Sweet [7D]. Unfortunately, Sweet's algorithm is unstable: 
see Luk and Qiao [58] . 

Other 0(ri^) algorithms for computing the matrices Q and U or U~^ were given by 
Bojanczyk, Brent and de Hoog (abbreviated BBH) 6 , Chun et al [23], Cybenko [25] , 
and Qiao [52], but none of them has been shown to be stable, and in several cases 
examples show that they are unstable. 

It may be surprising that fast algorithms for computing an orthogonal factoriza- 
tion (|6.1I) are unstable. The classical 0{n^) algorithms are stable because they form 
Q as a product of elementary orthogonal matrices (usually Givens or Householder 
matrices [36l [38l [81] ) . Unlike the classical algorithms, the 0{n^) algorithms do not 
form Q in a numerically stable manner as a product of matrices which are (close to) 
orthogonal. This observation explains both their speed and their instability! 

For example, the algorithms of BBH [6j and Chun et al [23] depend on Cholesky 
downdating, and numerical experiments show that they do not give a Q which is close 
to orthogonal. This is not too surprising, because Cholesky downdating is known to 
be a sensitive numerical problem [S] I67j . 



6.1. Use of the semi-normal equations. It can be shown that, provided the 
Cholesky downdates are implemented in a certain way (analogous to the condition 
for the stability of the generalized Schur algorithm), the BBH algorithm computes U 
in a weakly stable manner [7]. In fact, the computed upper triangular matrix U is 
about as good as can be obtained by performing a Cholesky factorization of T'^T, so 

||T^T-C7^t/||/||T^T|j=0,„(£). 

Thus, by solving 

U^Ux = T'^b 
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(the so-called semi-normal equations) we have a weakly stable algorithm for the so- 
lution of general Toeplitz systems Tx ~ b in 0{n?) operations. The solution can be 
improved by iterative refinement if desired [371. The computation of Q is avoided, 
and the algorithm is applicable to full-rank Toeplitz least squares problems. The 
disadvantage of this method is that, by implicitly forming T'^T, the condition of the 
problem is effectively squared. If the condition number k = k{T) is in the range 

1 1 

^ < K < - 

Ve £ 

then it will usually be impossible to get any significant figures in the result (iterative 
refinement may fail to converge) without reverting to a slow but stable orthogonal 
factorization algorithm. One remedy is to use double-precision arithmetic, i.e. replace 
£ by e^ , but this may be difficult if e already corresponds to the maximum precision 
implemented by hardware. 

Another way of computing the upper triangular matrix U, but not the orthogonal 
matrix Q, in (|6.ip . is to apply the generalized Schur algorithm to T^T. This method 
also squares the condition number. 

6.2. Computing Q stably. It seems difficult to give a satisfactory 0{n^) algo- 
rithm for the computation of Q in the factorization (16. ip . Using a modification of the 
embedding approach pioneered by Chun and Kailath [331 [52] , Chandrasekharan and 
Sayed [21] give a stable algorithm to compute a factorization 

(6.2) T ^ LQU 

where L is lower triangular. (Of course, the factorization (|6.2p is not unique.) The 
algorithm of |21| can be used to solve linear equations, but not least squares problems 
(because T has to be square, and in any case the matrix Q in (|6.2p is different from 
the matrix Q in (16. ip ). Because the algorithm involves embedding the n x n matrix 
T in a 2n X 2n matrix 

T 

the constant factors in the operation count are large: 59n^-t-0(nlogn), which should 
be compared to 8n^ -I- O(nlogn) for BBH and the semi-normal equations. (These 
operation counts apply for m — n: see [^ for operation counts of various algorithms 
when m > n.) Thus, although the embedding approach is elegant and leads to 
interesting (and in some cases stable) 0{n^) algorithms, a penalty is the significant 
increase in the constant factors. This is analogous to the penalty paid by the GKO 
algorithm because of the introduction of complex arithmetic. 

7. Conclusions. Although this survey has barely scratched the surface, we hope 
that the reader who has come this far is convinced that questions of numerical stability 
are amongst the most interesting, difficult, and useful questions that we can ask about 
fast algorithms for structured linear systems. It is not too hard to invent a new fast 
algorithm, but to find a new stable algorithm is more difficult, and to prove its stability 
or weak stability is a real challenge! 
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